Abstract

Rich-club and page-club coefficients and their null models are introduced
for directed graphs. Null models allow for a quantitative discussion of the
rich-club and page-club phenomena. These coefficients are computed for
four directed real-world networks: the Arxiv High Energy Physics paper citation
network, a Web network (released by Google), the citation network among US
Patents, and an email network from an EU research institution. The results show
a high correlation between rich-club and page-club ordering. For the journal
paper citation network, we identify both rich-club and page-club ordering,
showing that "elite" papers are cited by other "elite" papers. The Google web
network shows partial rich-club and page-club ordering up to some point
and then a sharp decline of the corresponding normalized coefficients,
indicating the lack of rich-club ordering and the lack of page-club ordering,
i.e. high in-degree (PageRank) pages purposely avoid sharing links with
other high in-degree (PageRank) pages. For the US patents citation network, we
identify page-club and rich-club ordering, leading to the conclusion that "elite"
patents are cited by other "elite" patents. Finally, for the e-mail communication
network we show the lack of both rich-club and page-club ordering. We construct
an example of a synthetic network showing page-club ordering and the lack of
rich-club ordering.
1. Introduction

The study of complex systems pervades almost all the sciences,
from cell biology to ecology, from computer science to meteorology, to name
just a few. A paradigm of a complex system is a network, described usually
as a graph, where complexity may come from different sources: topological
structure, network evolution, connection and node diversity, and/or dynamical
evolution. Perhaps the most widely known graph property is the node
degree distribution $P(k)$, which specifies the probability of nodes having
degree $k$ in a graph. The unexpected findings that degree distributions of some
real-world network topologies closely follow power laws stimulated further
interest in network research.

However, node degree distribution does not describe the interconnectivity
of nodes with given degrees, that is, it does not provide any information on
the total number $m(k_1, k_2)$ of links between nodes of degree $k_1$ and $k_2$. Joint
degree distribution is defined as $P(k_1, k_2) = m(k_1, k_2)\,\mu(k_1, k_2)/(2m)$, where
$\mu(k_1, k_2)$ is 2 if $k_1 = k_2$ and 1 otherwise, and $m$ is the number of links in the
graph. Clearly joint degree distribution contains more information about
connectivity in a graph than degree distribution: it provides information
about 1-hop neighborhoods around a node. Given $P(k_1, k_2)$, we can calculate
$P(k) = (\bar{k}/k) \sum_{k'} P(k, k')$, but not vice versa, where $\bar{k} = \sum_k k P(k)$.
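The relation between the joint degree distribution and the degree distribution can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the graph is given as a list of undirected edges without self-loops or duplicates, and the function names are hypothetical, not taken from the paper.

```python
from collections import Counter


def joint_degree_distribution(edges):
    """Return (P_joint, P_deg, mean_degree) for an undirected simple graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    m = len(edges)   # number of links
    n = len(deg)     # number of nodes (isolated nodes are ignored)

    # m(k1, k2): number of links between nodes of degree k1 and k2.
    link_count = Counter()
    for u, v in edges:
        link_count[tuple(sorted((deg[u], deg[v])))] += 1

    # P(k1, k2) = m(k1, k2) * mu(k1, k2) / (2m), stored symmetrically.
    p_joint = {}
    for (k1, k2), count in link_count.items():
        mu = 2 if k1 == k2 else 1
        p_joint[(k1, k2)] = p_joint[(k2, k1)] = count * mu / (2 * m)

    p_deg = {k: c / n for k, c in Counter(deg.values()).items()}
    mean_k = sum(k * p for k, p in p_deg.items())
    return p_joint, p_deg, mean_k


def degree_distribution_from_joint(p_joint, mean_k):
    """Recover P(k) = (mean_k / k) * sum_{k'} P(k, k')."""
    marginal = Counter()
    for (k, _), p in p_joint.items():
        marginal[k] += p
    return {k: mean_k / k * s for k, s in marginal.items()}
```

For a path on four nodes, for example, the recovered values agree with the directly counted degree distribution, while the reverse reconstruction is not possible from $P(k)$ alone.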

Although looking into the high order distributions is a complex task, 
reminding us there is a price to pay, a well chosen set of metrics can give us 
a simple partial view of high-order distributions. Several such graph metrics 
that exploit joint degree distribution are: 

• Assortativity coefficient: 




• Average neighbor connectivity:

$k_{nn}(k) = \sum_{k'} k' P(k'|k)$



• Local clustering:

$C(k) = 2 m_{nn}(k) / [k(k - 1)]$

where $m_{nn}(k)$ is the number of links between the neighbors of $k$-degree
nodes

• Rich-club coefficient:

$\phi(k) = 2 E_{>k} / [N_{>k}(N_{>k} - 1)]$

where $E_{>k}$ is the number of edges among the $N_{>k}$ nodes having degree
higher than a given value $k$.

In this paper we define two new metrics for directed graphs: the rich-club
coefficient and the page-club coefficient. These metrics can give us a deeper
understanding of complex networks as they represent different projections
of the joint degree distribution. When the network is undirected, both metrics
reduce to the rich-club coefficient for undirected graphs. The rich-club
phenomenon refers to the tendency of nodes with high centrality to form tightly
interconnected communities. Since this phenomenon is one of the crucial
properties accounting for the formation of dominant communities in real
networks and since many real networks are directed, in this paper we suggest
a generalization of this phenomenon to directed networks. The two metrics,
rich-club and page-club coefficients, may not be correlated: we construct an
example of a synthetic (directed) network showing page-club ordering and
the lack of rich-club ordering.

The paper is organized as follows. Section 2 overviews the rich-club coefficient
for undirected networks. In Section 3 two new metrics, the rich-club and
page-club coefficients for directed graphs, are introduced. In Section 4 these new
metrics are computed for four networks: (A) journal papers citation network,
(B) web graph (released by Google), (C) US patents citation network, and
(D) e-mail communication network. Our conclusions are presented in Section 5.



2. Rich-club coefficient for undirected networks 

Graphs considered in this section are undirected and unweighted simple
graphs. The rich-club coefficient, introduced by Zhou and Mondragon in the
context of the Internet [?], refers to the tendency of high degree nodes, the
hubs of the network, to be very well connected to each other. Denoting by
$E_{>k}$ the number of edges among the $N_{>k}$ nodes having degree higher than a
given value $k$, the rich-club coefficient is expressed as:

$$\phi(k) = \frac{2 E_{>k}}{N_{>k}(N_{>k} - 1)} \qquad (1)$$
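As a minimal illustration of Eq. (1), the Python sketch below counts the edges among nodes of degree higher than $k$ and divides by the number of possible pairs. It assumes a simple undirected graph given as an edge list; the function name and the choice to return `None` when fewer than two nodes remain are assumptions made here for the example.

```python
from collections import Counter


def rich_club_coefficient(edges, k):
    """phi(k) = 2 E_{>k} / (N_{>k} (N_{>k} - 1)) for a simple undirected graph.

    Returns None when fewer than two nodes have degree higher than k,
    since the coefficient is then undefined.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    club = {node for node, d in deg.items() if d > k}
    n_club = len(club)
    if n_club < 2:
        return None
    e_club = sum(1 for u, v in edges if u in club and v in club)
    return 2 * e_club / (n_club * (n_club - 1))


# A triangle whose three nodes each carry one pendant neighbor.
example = [(0, 1), (1, 2), (0, 2), (0, 3), (1, 4), (2, 5)]
print(rich_club_coefficient(example, 1))  # the three hubs form a full triangle
```

In this toy graph the three degree-3 nodes are fully connected, so the coefficient at $k = 1$ reaches its maximum value, while at $k = 0$ it equals the plain edge density of the whole graph.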



After some basic analysis of the rich-club coefficient [3], we can
see that it can be expressed as a function of the joint degree distribution,

$$\phi(k) = \frac{N \langle k \rangle \sum_{k'=k+1}^{k_{max}} \sum_{k''=k+1}^{k_{max}} P(k', k'')}{N_{>k}(N_{>k} - 1)}, \qquad (2)$$

where $\langle k \rangle$ is the average degree, giving us a partial view of the
high-order degree distribution which is far more economical to compute.

In [?], the rich-club coefficient is defined in terms of nodes with rank less
than $r_{max}$, where nodes are sorted by decreasing degree and the node
rank $r$ denotes the position of a node on this ordered list normalized by the
total number of nodes. Several networks are compared, and the value of the
coefficient at a threshold rank of 1%, i.e. $\phi(1\%)$, was used to differentiate
the networks and provide evidence of the rich-club phenomenon. However, a
monotonic increase of $\phi(k)$ does not necessarily imply the presence of the
rich-club phenomenon. Indeed, even the ER graph, a completely random network,
has an increasing rich-club coefficient. This implies that the increase of $\phi(k)$
is a natural consequence of the fact that vertices with large degree have a
larger probability of sharing edges than low degree vertices. This feature is
therefore imposed by construction and does not represent a signature of any
particular organizing principle or structure, as is clear in the ER case [3].
The simple inspection of the $\phi(k)$ trend is therefore potentially misleading
in the discrimination of the rich-club phenomenon; it can only be used as a
simple statistical property to differentiate several networks in their complex
structure.

Figure 1: Synthetic network with depth d = 6. Nodes are colored gradually
according to their in-degree, with white to black color denoting lowest to
highest in-degree respectively. The additional nodes, i.e. nodes of the set
$S_i$, are aggregated in one node with label $|S_i|$ for simplicity.

Therefore, in order to detect the rich-club phenomenon, several null models
were proposed that normalize the basic rich-club coefficient. A null model
was presented in [?] where the rich-club coefficient is normalized as
$\rho(k) = \phi(k)/\phi_{ran}(k)$, where

$$\phi_{ran}(k) = \frac{1}{N \langle k \rangle} \left[ \frac{\sum_{k'=k+1}^{k_{max}} k' P(k')}{\sum_{k'=k+1}^{k_{max}} P(k')} \right]^2 \; \underset{k,\, k_{max} \to \infty}{\sim} \; \frac{k^2}{\langle k \rangle N} \qquad (3)$$

(using $N_{>k} = N \sum_{k'>k} P(k')$ and $N_{>k} - 1 \approx N_{>k}$)
is the rich-club coefficient of the maximally random network (uncorrelated
network) with the same degree distribution $P(k)$ as the network under study.
Operatively, the maximally random network can be thought of as the stationary
ensemble of networks visited by a process that, at any time step,
randomly selects a pair of links of the original network and exchanges two
of their end points (automatically preserving the degree distribution) [?].
An actual rich-club ordering is denoted by a ratio $\rho(k) > 1$. Note that in
sufficiently large networks and for large $k$, $\phi_{ran}(k)$ clearly depends on
$k$.
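The link-exchange process that defines the maximally random network can be sketched in a few lines of Python. This is a hedged illustration rather than the procedure used for the results in this paper: the function names, the number of swaps per edge and the number of samples are all assumptions, and swaps that would create self-loops or duplicate links are simply rejected.

```python
import random
from collections import Counter


def rich_club(edges, k):
    """phi(k) of Eq. (1); None if fewer than two nodes have degree > k."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    club = {node for node, d in deg.items() if d > k}
    if len(club) < 2:
        return None
    inside = sum(1 for u, v in edges if u in club and v in club)
    return 2 * inside / (len(club) * (len(club) - 1))


def rewire_preserving_degrees(edges, n_swaps, seed=0):
    """Exchange end points of random link pairs; every node keeps its degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:      # pick one of the two possible pairings
            c, d = d, c
        if a == d or c == b:        # would create a self-loop
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                # would create a duplicate link
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def normalized_rich_club(edges, k, n_samples=20, swaps_per_edge=10, seed=0):
    """rho(k) = phi(k) / phi_ran(k), with phi_ran averaged over rewired copies."""
    phi = rich_club(edges, k)
    if phi is None:
        return None
    total = 0.0
    for s in range(n_samples):
        shuffled = rewire_preserving_degrees(
            edges, swaps_per_edge * len(edges), seed + s)
        total += rich_club(shuffled, k)  # same club, degrees are preserved
    phi_ran = total / n_samples
    return phi / phi_ran if phi_ran > 0 else None
```

Averaging over rewired copies is an empirical alternative to the analytical expression for the uncorrelated case; for small or dense graphs the two can differ because rejected swaps restrict the ensemble that is actually sampled.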



3. Novel metrics for directed graphs 

In this section we consider directed graphs. A directed graph (or digraph) 
is a pair G = (V, E) of a set V, whose elements are called vertices or nodes, 
and a set E of ordered pairs of vertices, called arcs or directed edges. 
Figure 2: Synthetic network: (a) normalized page-club coefficient $\phi^{PR}(l)/\phi^{PR}_{ran}(l)$ versus
PageRank; (b) normalized rich-club coefficient $\phi^{in}(k)/\phi^{in}_{ran}(k)$ versus in-degree. The
network shows the lack of rich-club ordering, but strong page-club ordering.
3.1. In-degree rich-club coefficient

Having directed networks in mind, the rich-club coefficient can be defined
in two ways, in terms of in-degree and out-degree. The one that we are
interested in is the in-degree rich-club coefficient defined, in a very similar
way to (1), as

$$\phi^{in}(k) = \frac{E^{in}_{>k}}{N^{in}_{>k}(N^{in}_{>k} - 1)}, \qquad (4)$$

where $E^{in}_{>k}$ is the number of directed edges among the $N^{in}_{>k}$ nodes having
in-degree higher than a given value $k$. Note that the factor 2 is missing from the
numerator, since a directed full-mesh graph has twice as many edges as the
corresponding undirected graph.

We can express the numerator in (4) as

$$E^{in}_{>k} = \sum_{k'=k+1}^{k^{max}_{in}} \sum_{k''=k+1}^{k^{max}_{in}} E_{k' \to k''}, \qquad (5)$$

where $k^{max}_{in}$ is the maximum node in-degree of the network and $E_{k' \to k''}$ denotes
the number of edges pointing from a node of in-degree $k'$ to a node of
in-degree $k''$. Only in the case of random uncorrelated networks, $E_{k' \to k''}$ takes
the simple form

$$E_{k' \to k''} = \frac{N P_{in}(k'')\, k''\, \langle k_{out} \rangle_{k'}\, P_{in}(k')}{\langle k_{in} \rangle}, \qquad (6)$$

where $\langle k_{out} \rangle_{k'}$ denotes the out-degree averaged over all nodes of in-degree
$k'$, $P_{in}(k)$ denotes the probability of a node having in-degree $k$, and
$\langle k_{in} \rangle$ is the average in-degree. At first sight $\langle k_{out} \rangle_{k'}$ may seem
constant in the case of large networks representing web graphs, but having in mind
the power-law distribution of in-degree, the number of nodes belonging to the same
in-degree class becomes very small for high in-degree and is insufficient for
$\langle k_{out} \rangle_{k'}$ to converge to the overall average $\langle k_{out} \rangle$. By inserting
(6) and (5) into (4) we obtain the null model $\phi^{in}_{ran}(k)$ for uncorrelated
directed networks as

$$\phi^{in}_{ran}(k) = \frac{N \sum_{k''=k+1}^{k^{max}_{in}} k'' P_{in}(k'') \sum_{k'=k+1}^{k^{max}_{in}} \langle k_{out} \rangle_{k'} P_{in}(k')}{\langle k_{in} \rangle N^{in}_{>k}(N^{in}_{>k} - 1)}.$$
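To make the in-degree rich-club coefficient and its uncorrelated null model concrete, the following Python sketch evaluates both from a directed edge list by summing over in-degree classes. It is a minimal sketch under stated assumptions: edges are (source, target) pairs without duplicates, and the function name and the `None` return for an undefined coefficient are choices made for this example only.

```python
from collections import Counter, defaultdict


def in_degree_rich_club(edges, k):
    """Return (phi_in, phi_in_ran) at threshold k for a directed edge list.

    phi_in     = E_in / (N_in (N_in - 1))
    phi_in_ran = N * sum_{k''>k} k'' P_in(k'') * sum_{k'>k} <k_out>_{k'} P_in(k')
                 / (<k_in> N_in (N_in - 1))
    Both are None when fewer than two nodes have in-degree > k.
    """
    nodes = {x for edge in edges for x in edge}
    k_in = Counter(v for _, v in edges)
    k_out = Counter(u for u, _ in edges)
    n = len(nodes)

    club = {v for v in nodes if k_in[v] > k}
    n_club = len(club)
    if n_club < 2:
        return None, None
    pairs = n_club * (n_club - 1)
    e_club = sum(1 for u, v in edges if u in club and v in club)
    phi = e_club / pairs

    # Per in-degree class: number of nodes and total out-degree.
    class_size = Counter(k_in[v] for v in nodes)
    class_out = defaultdict(int)
    for v in nodes:
        class_out[k_in[v]] += k_out[v]

    mean_k_in = len(edges) / n
    sum_in = sum(kk * class_size[kk] / n for kk in class_size if kk > k)
    sum_out = sum(
        (class_out[kk] / class_size[kk]) * (class_size[kk] / n)
        for kk in class_size if kk > k
    )
    phi_ran = n * sum_in * sum_out / (mean_k_in * pairs)
    return phi, phi_ran


edges = [(0, 1), (1, 0), (2, 0), (2, 1), (3, 2)]
print(in_degree_rich_club(edges, 1))
```

In this five-edge example the two nodes with in-degree 2 link to each other in both directions, so the measured coefficient exceeds the value expected for an uncorrelated network with the same in-degree classes.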
Figure 3: Journal papers citation network: (a) normalized page-club coefficient
$\phi^{PR}(l)/\phi^{PR}_{ran}(l)$ versus PageRank; (b) normalized rich-club coefficient
$\phi^{in}(k)/\phi^{in}_{ran}(k)$ versus in-degree. The network shows both page-club and
rich-club ordering. Note that the page-club ordering is much stronger than the
rich-club ordering.
3.2. Page-club coefficient

Analogously to the rich-club coefficient we define a new graph metric called
the page-club coefficient, which refers to the tendency of high PageRank nodes,
i.e. the most popular pages in the web, to be highly interconnected. Consider a
modified random walker that, at node $i$, behaves according to the following two rules:
(a) with probability $1 - q$, the walker follows any outgoing link of $i$, chosen
with equal probability, and (b) with probability $q$ it moves to a generic node
of the network (including $i$), chosen with equal probability. Therefore,

$$pr(i) = \frac{q}{N} + (1 - q) \sum_{j:\, j \to i} \frac{pr(j)}{k^{out}_j},$$

where the sum runs over the nodes $j$ that link to $i$ and $k^{out}_j$ is the
out-degree of $j$. We assume that each node has at least one outgoing link, and
therefore the last equation is well defined. Thus, for each $i$, the stationary
probability $pr(i)$, also called PageRank, is well defined and $pr(i) > 0$. The
probability $q$ is referred to as the damping factor; the damping factor adopted
in real applications is generally small ($q \approx 0.15$).

Denoting by $E_{PR>l}$ the number of directed edges among the $N_{PR>l}$ nodes
having PageRank value higher than a given value $l$, the page-club coefficient
is expressed as:

$$\phi^{PR}(l) = \frac{E_{PR>l}}{N_{PR>l}(N_{PR>l} - 1)}. \qquad (7)$$

Please note that in uncorrelated networks this metric converges to the classical
in-degree rich-club coefficient, since in uncorrelated networks the average
PageRank of nodes of the same in-degree class depends linearly on the
in-degree [?], i.e.

$$\overline{pr}(k_{in}) = \frac{q}{N} + \frac{1 - q}{N} \frac{k_{in}}{\langle k_{in} \rangle},$$

where $\overline{pr}(k_{in})$ is the average PageRank of nodes with in-degree $k_{in}$. Also, it is
important to mention that relative fluctuations of PageRank within the same
class decrease as the in-degree increases. Analogously to the in-degree case, an
appropriate null model for page-club can be defined:

$$\phi^{PR}_{ran}(l) = \frac{N}{\langle k_{in} \rangle N_{PR>l}(N_{PR>l} - 1)} \times \sum_{l''>l} \langle k_{in} \rangle_{l''} P_{pr}(l'') \sum_{l'>l} \langle k_{out} \rangle_{l'} P_{pr}(l'),$$

where $P_{pr}(l)$ denotes the probability of a node having PageRank $l$, and
$\langle k_{in} \rangle_l$ and $\langle k_{out} \rangle_l$ are the average node in-degree and average node out-degree,
averaged over all nodes of PageRank class $l$.
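The page-club coefficient can be illustrated with a short Python sketch that first computes PageRank by power iteration and then counts the links among nodes above a PageRank threshold. It assumes the update rule pr(i) = q/N + (1 - q) * sum over in-neighbors j of pr(j)/k_out(j), which is the stationary condition of the random walker described above; the function names, tolerance and iteration cap are assumptions for this example, and nodes without outgoing links are rejected rather than handled.

```python
def pagerank(nodes, edges, q=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for pr(i) = q/N + (1-q) * sum_{j->i} pr(j) / k_out(j)."""
    out_links = {v: [] for v in nodes}
    for u, v in edges:
        out_links[u].append(v)
    if any(len(targets) == 0 for targets in out_links.values()):
        raise ValueError("every node needs at least one outgoing link")

    n = len(out_links)
    pr = {v: 1.0 / n for v in out_links}
    for _ in range(max_iter):
        new = {v: q / n for v in out_links}
        for u, targets in out_links.items():
            share = (1 - q) * pr[u] / len(targets)
            for v in targets:
                new[v] += share
        delta = sum(abs(new[v] - pr[v]) for v in out_links)
        pr = new
        if delta < tol:
            break
    return pr


def page_club_coefficient(edges, pr, l):
    """phi_PR(l) = E_{PR>l} / (N_{PR>l} (N_{PR>l} - 1)); None if undefined."""
    club = {v for v, score in pr.items() if score > l}
    n_club = len(club)
    if n_club < 2:
        return None
    e_club = sum(1 for u, v in edges if u in club and v in club)
    return e_club / (n_club * (n_club - 1))


# Directed 3-cycle plus one node that only links into it.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
scores = pagerank(nodes, edges)
print(page_club_coefficient(edges, scores, 0.05))
```

In this toy graph the node without in-links keeps the minimal score $q/N$, so any threshold slightly above that value selects exactly the three nodes of the cycle.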

3.3. Synthetic network 

Generally, all the results for the networks used in this paper show a high
correlation between page-club and rich-club coefficients (see Section 4), but one
should not derive a general conclusion from this observation. In this section
we generate a synthetic network showing an increase of the page-club coefficient
and a decrease of the rich-club coefficient. Consider the directed tree graph
$G = (V, E)$ where $V$ represents the node set and $E$ the edge set. Let this tree
have depth $d$, where the depth of a node $i$ is the length of the path from the root
to the node, and the depth of the tree is the maximal length of all such paths.
Further, let $G$ be a labeled graph, where each node can have one of several
labels (depending on its depth in the tree), i.e. $V = V_0 \cup V_1 \cup \ldots \cup V_d$. We denote
by $|V_i|$ the number of depth $i$ nodes. Assume that $|V_i| = i + 1$, $i = 0, \ldots, d$,
i.e. the number of nodes increases by one as we move to a larger depth in the tree.
We compose the edge set of the ordered pairs of vertices
$\{(V_{i,k}, V_{i-1,k}) \mid 1 \le k \le |V_{i-1}|,\ 1 \le i \le d\} \cup \bigcup_{i=1}^{d} \{V_{i,|V_i|}\} \times V_{i-1}$,
where $V_{i,k}$ denotes the $k$-th node in $V_i$. In other words, depth $i$ nodes
propagate their PageRank score to depth $i - 1$ nodes, and furthermore, all nodes
in the same class (depth) have equal PageRank values. By this construction, all
the nodes, except the leaves, have in-degree 2. To change this property, we add
an additional set of nodes $S_i$ connecting to the set $V_i$ with the edge set
$E_i = S_i \times V_i$, i.e. we increment the in-degree of the nodes in the set $V_i$
by $|S_i|$. Note that the in-degree of these additional nodes is 0. So, we tweak
the in-degree of the nodes in the class $V_i$, $k^{in}_i$, by the following rule:
except for the leaf nodes, we start with $k^{in}_0 = 2$ and increment the in-degree
by one at every even depth, whereas for odd depths, we start with a high in-degree
at lower depths and decrement the in-degree as we go to higher depths. A more
formal definition would be

$$k^{in}_i = \begin{cases} 0, & i = d \\ i/2 + 2, & i = 2n,\ n \in \mathbb{N} \\ \lfloor (d - i)/2 \rfloor + 2, & i = 2n + 1,\ n \in \mathbb{N} \end{cases} \qquad (8)$$

Such a graph with depth $d = 6$ is shown in Fig. 1. What we want to achieve
is the lack of rich-club ordering, where nodes with high in-degree connect to
nodes with low in-degree and vice versa. Further, a direct consequence of the
tree structure is that nodes with high PageRank will propagate their score to
the successor nodes and therefore positive page-club ordering should arise.
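The construction can be sketched in Python as follows. This is an illustration under an explicit assumption about the in-degree rule: even depths $i < d$ receive in-degree $i/2 + 2$, odd depths receive $\lfloor (d - i)/2 \rfloor + 2$, and leaves receive 0. With this reading a depth-50 graph has 1926 nodes and its 612 highest in-degree nodes share no links, which matches the figures quoted for the generated network. Node labels and function names are hypothetical.

```python
from collections import Counter


def target_in_degree(i, d):
    """Assumed in-degree rule for the nodes of depth i in a tree of depth d."""
    if i == d:
        return 0
    if i % 2 == 0:
        return i // 2 + 2
    return (d - i) // 2 + 2


def build_synthetic_network(d):
    """Return (nodes, edges); nodes are ('V', depth, index) or ('S', depth, index)."""
    nodes = {("V", i, k) for i in range(d + 1) for k in range(i + 1)}
    edges = []
    for i in range(1, d + 1):
        last = ("V", i, i)                      # the extra node of depth i
        for k in range(i):
            edges.append((("V", i, k), ("V", i - 1, k)))
            edges.append((last, ("V", i - 1, k)))
    # Every non-leaf node now has in-degree 2; the sets S_i add the rest.
    for i in range(d):
        for s in range(target_in_degree(i, d) - 2):
            extra = ("S", i, s)
            nodes.add(extra)
            for k in range(i + 1):
                edges.append((extra, ("V", i, k)))
    return nodes, edges


def in_degrees(edges):
    """In-degree of every node that receives at least one link."""
    return Counter(v for _, v in edges)


nodes, edges = build_synthetic_network(50)
k_in = in_degrees(edges)
top = {v for v in nodes if k_in[v] > 14}
print(len(nodes), len(top), sum(1 for u, v in edges if u in top and v in top))
```

The high in-degree classes sit at alternating depths (low odd depths and high even depths), so no two of them are adjacent in the tree, which is what suppresses rich-club ordering while PageRank still accumulates toward the root.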

The results are shown in Fig. 2 for a generated graph of depth 50 with
1926 nodes. We observe the lack of rich-club ordering, but strong page-club
ordering, as expected. Also, we stress that the top 612 in-degree nodes do
not share any links. Thus, for such networks, the results of the analysis of
the inter-connectivity of nodes would clearly depend on the definition of the
"rich" nodes (in-degree or PageRank).



4. Real networks: results and discussions 

The network data used in this paper consist of four networks:

• Arxiv High Energy Physics paper citation network (cit-HepPh): Directed,
Temporal, Labeled network with 34,546 nodes and 421,578
edges;

• Web graph from Google (web-Google): Directed network with 875,713
nodes and 5,105,039 edges;

• Citation network among US Patents (cit-Patents): Directed, Temporal,
Labeled network with 3,774,768 nodes and 16,518,948 edges; and

• Email network from an EU research institution (email-EuAll): Directed
network with 265,214 nodes and 420,045 edges.

For each network we compute the normalized rich-club and page-club coefficients
$\rho_{in}(k) = \phi^{in}(k)/\phi^{in}_{ran}(k)$ and $\rho_{PR}(l) = \phi^{PR}(l)/\phi^{PR}_{ran}(l)$. We stress that
$\rho_{in}(k)$ and $\rho_{PR}(l)$ may, in some cases, be undefined. In the following, we
discuss only the quantity $\rho_{in}(k)$ since the discussion for $\rho_{PR}(l)$ is exactly
the same. $\rho_{in}(k)$ is undefined when its denominator is equal to zero,
$\phi^{in}_{ran}(k) = 0$. We rewrite

$$\phi^{in}_{ran}(k) = \frac{inlinks_{>k} \; outlinks_{>k}}{|E| \, N^{in}_{>k}(N^{in}_{>k} - 1)},$$

where $inlinks_{>k}$ ($outlinks_{>k}$) denotes the number of in-links (out-links)
arriving to (departing from) nodes that have in-degree greater than $k$ and $|E|$
denotes the number of directed edges in the network.
Figure 4: Web graph (released in 2002 by Google): (a) normalized page-club coefficient
$\phi^{PR}(l)/\phi^{PR}_{ran}(l)$ versus PageRank, and (b) normalized rich-club coefficient
$\phi^{in}(k)/\phi^{in}_{ran}(k)$ versus in-degree. The page-club and rich-club ordering are
observed only to some point, then the network shows the lack of page-club ordering
and the lack of rich-club ordering.

Figure 5: US Patents citation network: (a) normalized page-club coefficient
$\phi^{PR}(l)/\phi^{PR}_{ran}(l)$ versus PageRank; (b) normalized rich-club coefficient
$\phi^{in}(k)/\phi^{in}_{ran}(k)$ versus in-degree. The network shows both page-club and
rich-club ordering. Note that the page-club ordering is much stronger than the
rich-club ordering.

We consider several cases:

• When $N^{in}_{>k} = 1$ or $N^{in}_{>k} = 0$, i.e. when we have a single node or no
nodes in the "club".

• When $inlinks_{>k} = 0$. Note that this case should not happen in practice
since in the two special cases of $\phi^{in}_{ran}(0)$ and $\phi^{in}_{ran}(k^{max}_{in})$ we generally
have a positive number of in-links.

• When $outlinks_{>k} = 0$. This case happens in tree graphs, such as
citation networks, where the top in-degree nodes (roots) have no out-links.

We handle all these cases by assigning $\rho_{in}(k)$ the value 1. It is important to
stress that $\rho_{in}(k)$ can also take the value zero: its denominator can be well
defined, i.e. there is a positive number of out-links departing from the "club", while
no links are shared within the "club".
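The treatment of the undefined cases can be summarized in a small Python sketch. It assumes a directed edge list of (source, target) pairs without duplicates and uses the rewritten null model, i.e. the product of in-links and out-links of the club divided by $|E|$ times the number of ordered node pairs in the club; the function name is hypothetical.

```python
from collections import Counter


def normalized_in_degree_rich_club(edges, k):
    """rho_in(k) with the undefined cases mapped to 1, as described in the text."""
    nodes = {x for edge in edges for x in edge}
    k_in = Counter(v for _, v in edges)
    club = {v for v in nodes if k_in[v] > k}
    n_club = len(club)

    inlinks = sum(1 for _, v in edges if v in club)    # links arriving at the club
    outlinks = sum(1 for u, _ in edges if u in club)   # links departing from the club

    # Fewer than two club nodes, or a vanishing null model: assign the value 1.
    if n_club < 2 or inlinks == 0 or outlinks == 0:
        return 1.0

    pairs = n_club * (n_club - 1)
    e_club = sum(1 for u, v in edges if u in club and v in club)
    phi = e_club / pairs
    phi_ran = inlinks * outlinks / (len(edges) * pairs)
    return phi / phi_ran


# Two high in-degree nodes that link outward but never to each other.
edges = [(2, 0), (3, 0), (4, 1), (5, 1), (0, 7), (1, 8)]
print(normalized_in_degree_rich_club(edges, 1))  # zero: no links inside the club
```

The zero returned for this example corresponds to the situation discussed above, where the null model is positive but the club members share no links.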

We also stress that for the PageRank computation we used the damping
factor q = 0.15 for all the networks. We also used q = 0.5 for the journal
citation network, as proposed by [?], but no significant changes were observed,
so these results are omitted.

4.1. Papers citation network

The Arxiv High Energy Physics paper citation network is formed from the 
e-print arXiv dataset and covers all the citations within a dataset of 34,546 
papers with 421,578 edges. If a paper i cites paper j, the graph contains a 
directed edge from i to j. If a paper cites, or is cited by, a paper outside the 
dataset, the graph does not contain any information about this. The data 
covers papers in the period from January 1993 to April 2003 (124 months). It 
begins within a few months of the inception of the arXiv, and thus represents 
essentially the complete history of its HEP-PH section. The graph has an 
exponential degree distribution and a tree structure. 

In Fig. 3 we show the normalized rich-club and page-club coefficients and
identify both rich-club and page-club ordering, leading to the conclusion that
"elite" papers are cited by other "elite" papers. If we say that "elite" papers
are written by "elite" scientists and those scientists decide to reference "elite"
papers, i.e. papers written by other "elite" scientists, then this result
coincides with previous findings [3], i.e. it indicates the existence of an
"oligarchy" of highly influential and mutually communicating scientists. Note the
difference between page-club and rich-club. The page-club ordering is much
stronger than the rich-club ordering, which can be explained by the
tree structure of the citation network, i.e. older papers which are higher in the
hierarchy have higher PageRank, collected from all their successors
independently of the number of their direct successors, i.e., their in-degree. We also
point out that the top 8 PageRank nodes (roots) have no out-links, thus having
a page-club value of one, whereas the top in-degree nodes have a positive
rich-club ordering.

Figure 6: Email communication network: (a) normalized page-club coefficient
$\phi^{PR}(l)/\phi^{PR}_{ran}(l)$ versus PageRank; (b) normalized rich-club coefficient
$\phi^{in}(k)/\phi^{in}_{ran}(k)$ versus in-degree. The lack of page-club ordering and the
lack of rich-club ordering are observed.

4.2. Google Web graph

In the web graph released in 2002 by Google as a part of the Google Programming
Contest, nodes represent web pages and directed edges represent hyperlinks
between them. The graph consists of nearly one million (1,000,000)
nodes and over five million (5,000,000) edges following a power-law degree
distribution.

In Fig. 4 we show the normalized rich-club and page-club coefficients,
where we identify partial rich-club (page-club) ordering up to some point
(the middle layer) and then a sharp decline of the coefficients, which
shows the lack of rich-club ordering and the lack of page-club ordering.
In other words, in this network high-degree (PageRank) pages purposely
avoid sharing links with other high-degree (PageRank) pages. Some sort of
competitiveness among strong pages could be a possible explanation of this
phenomenon. Also note the uncorrelated property of the network, which
explains the high similarity between the page-club and rich-club coefficients [?].
We stress that the top 24 (20) in-degree (PageRank) nodes do not share
any links between them despite their positive number of out-links, hence
the zero values of the rich-club and page-club coefficients.

4.3. US Patents citation network

The U.S. patent dataset, maintained by the National Bureau of Economic
Research, spans 37 years (January 1, 1963 to December 30, 1999), and
includes all the utility patents granted during that period, totaling about four
million (4,000,000) patents. The citation graph includes all citations made
by patents granted between 1975 and 1999, totaling 16,522,438 citations. For
the patents dataset there are 1,803,511 nodes for which we have no information
about their citations (we only have the in-links).

In Fig. 5(b) we show the normalized rich-club coefficient: the rich-club
ordering is observed up to some point, and after that point, the ordering
quickly decreases to zero, where the top 27 in-degree nodes do not share
any links between them. Fig. 5(a) shows the normalized page-club coefficient:
one can identify page-club ordering, leading to the conclusion that
top PageRank patents are cited by top PageRank patents. Note that the page-club
ordering is much stronger than the rich-club ordering, generally because of
the tree structure.

4.4. Email communication network

The network of email communication of a large European research institution
contains all incoming and outgoing email of the research institution
for the period of October 2003 to May 2005 (18 months). Given a set of
email messages, each node corresponds to an email address. A directed edge
between nodes i and j was created if i sent at least one message to j. The
network consists of 265,214 nodes and 420,045 edges.

Fig. 6 shows the normalized rich-club and page-club coefficients. The
lack of page-club ordering and the lack of rich-club ordering for
this network could be explained by observing that scientists work in
research groups where each group has one to a few "elite" scientists managing
the group, and communication between the "elite" scientists from different
groups is reduced to a minimum. The top 9 (6) in-degree (PageRank) nodes
do not share any links between them despite their positive number of
out-links, hence the zero values of the rich-club and page-club coefficients.

5. Conclusion 

In this paper two new metrics for directed graphs are introduced, namely
the normalized rich-club coefficient and the normalized page-club coefficient.
These two coefficients are computed for different directed graphs. The results
indicate a high correlation between page-club and rich-club coefficients
except for the synthetic network, for which the coefficients have opposite
behavior. In general, despite the high correlation observed in several real
networks, these metrics are not the same. Detecting the rich-club phenomenon is
often used to indicate the dominance of an "oligarchy" of "rich" and mutually
communicating entities. However, this analysis clearly depends on the definition
of "rich" nodes. The page-club coefficient treats nodes with high PageRank
as the "popular" nodes; thus, in networks where PageRank emerges as a
natural metric for distinguishing between popular and unpopular nodes, one
should use page-club to indicate the emergence of an "oligarchy" formed by
"elite" nodes.